Podcastle: Improvements of Speech Recognition by Using Acoustic Modeling Based on Wisdom of Crowds

نویسندگان

Jun Ogata

Masataka Goto

چکیده

1 はじめに我々は,ポッドキャストを音声認識によって自動的にテキスト化することで,それらをユーザが全文検索できるだけではなく,詳細な閲覧, 編集も可能なソーシャルアノテーションシステム「PodCastle1)2)3)」の開発,運営を行っている. ポッドキャストは実環境の多様な音声データであり,従来の音声認識技術では高い認識率を達成することは難しい.そこで PodCastleでは,多数のユーザに認識誤りを訂正 (アノテーション)する協力をしてもらうことで,音声認識率をシステムの運用中に向上させる枠組みを採用している.こうすることで,検索サービスとしての質を向上させるだけでなく,音声認識技術の底上げをはかることも狙っている. 本研究では,上記の枠組みの一環として,PodCastleを通じて得られる集合知,すなわちユーザによる音声認識誤りの訂正結果を活用した音響モデル学習について検討し,ポッドキャスト音声認識において評価を行う.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Podcastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription

This paper presents acoustic-model-training techniques for improving automatic transcription of podcasts. A typical approach for acoustic modeling is to create a task-specific corpus including hundreds (or even thousands) of hours of speech data and their accurate transcriptions. This approach, however, is impractical in podcast-transcription task because manual generation of the transcriptions...

متن کامل

PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds

This paper presents a language-model training method for improving automatic transcription of online spoken contents. Unlike previously studied LVCSR tasks such as broadcast news and lectures, large-sized task-specific corpora for training language models cannot be prepared and used in recognition because of the diversity of topics, vocabularies, and speaking styles. To overcome difficulties in...

متن کامل

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

متن کامل

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2009

Podcastle: Improvements of Speech Recognition by Using Acoustic Modeling Based on Wisdom of Crowds

نویسندگان

چکیده

منابع مشابه

Podcastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription

PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds

Allophone-based acoustic modeling for Persian phoneme recognition

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

عنوان ژورنال:

اشتراک گذاری